library(knitr)
opts_chunk$set(echo=FALSE, warning=FALSE, message=TRUE, cache=TRUE)#, fig.width=12, fig.height=8, fig.path='Figs/')
options(digits=3)
seed <- 1859; set.seed(seed)
suppressPackageStartupMessages(library(mvtnorm))

In this document I will compare the results of the analysis run on the ET and EE model with and without the singleton and ‘full’ configuration. Recall that we performed this analysis by computing the maximum t statistic across tissues for each SNP-gene pair, and then using the SNP which yielded the maximum value of this statistic within a gene’s cis region. This resulted in 16,407 Genes. Let us begin by considering the EE model, in which we assume that each \(\beta_{j}\) ~ N(0,\(\sigma_{EE}\)). In this document, I refer to the config or bma model as the model in which we estimate the posterior effect as a mixture over 1242 componenets which represent:

  1. The Identity Matrix
  2. \(X'X\), the empirical sample covariance matrix of T statistics.
  3. \(Vd^2V\), the rank 2 singular value decomposition of the sample covariance matrix, where \(UdV'\) represents the Singular Value Decomposition of the column centered matrix of T statistics. 4-8) \((LF)'(LF)\) - 5 rank 1 approximations resulting from the SFA decomposition of the matrix of t statistics.
  4. \((LF)'(LF)\) resulting from the rank 5 SFA decomposition of the matrix of t statistics 10-53 Rank1 [ 1 0 0 0 0 …] etc ‘singleton’ configruation representing covariance matrices of tissue-specific effects
  5. The ‘full’ model representing the prior covariance matrix which would specify effects of equal size in each tissue ([1 1 1 1 1]).

Because we choose 23 \(\omegas\) representing the stretch along the directional vectors specified above, this results in 1242 (54x23) componenets in the full case and and 207 (9x23) in the bma-free case.

## Loading required package: bigmemory.sri
## Loading required package: BH
## 
## bigmemory >= 4.0 is a major revision since 3.1.2; please see packages
## biganalytics and and bigtabulate and http://www.bigmemory.org for more information.

First, we should assure ourselves that the posterior mean computation is correct for each componenet and in computing the total weighted posterior mean. We show that the diagonal of the posterior covariance matrix computed for a particular componenet matches that stored in the \(JxKxR\) array, similarly that the \(R\) element posterior mean vector computed for the \(kth\) componenet matches that stored in the \(JxKxR\) array, and that the dot product of the the \(k\) element posterior weight vector for a parricular SNP and the \(KxR\) matrix of posterior posterior means matches the posterior means stored in

Now, we load the weighted posterior quantities and can compare between models.We might first want to consider the distribution of lfsr in each model. The maximum lfsr achieved in the \(ee.bm\) - that is, the model which included all singleton and full configurations - will return some \(lfsr\) equal to 0 as there is non trivial prior weight on each of the singleton configurations for which the posterior variance is 0. Thus for a gene snp pair that truly reflects tissue specific effects, the lfsr for all tissues other than the tissue in which the strong association occurs may be 1 because the posterior weight assigned to the singleton configuration in which it operates may occupt the majority of the distribution. However, the use of the tissue-specific configuration also allows us to extract lower lfsrs for cases in which the gene-snp pair truly has a strong tissue specific effect as the assocation can put the majorty of the posterior weight on a componenet for which we are \(confident\) in the sign (i.e., a large maximum tail probability due to low posterior variance in the particular tissue and or a larger poterior mean)

## [[1]]
## NULL
## 
## [[2]]
## NULL
## 
## [[3]]
## NULL
## 
## [[4]]
## NULL
## 
## [[5]]
## NULL

Consider this example, where the posterior mean estimate using the singleton configurations ‘shrinks’ all the tissues with lfsr greater than 0.05 in the non-bma case to 0. First we plot using the posterior mean estimates from our mixture which included the singleton and ‘full configurations’.

And then we consisder the posterior mean in the model which didn’t include these configurations.

If one considers the distribution of psoterior weights in this case, they see that the majority of the posterior weight is put on the singleton config reflecting activity in only this tissue.

## [1] "k=27;l=20"

## [1] "k=9;l=9"

We can see that even though the max prior weight is on U_k = 2 (the sample covariance matrix), the max weight for the individual is at the tissue specific componenet.

## [1] "k=2;l=17"

## [1] "k=9;l=8"

Thus we see that both models assigned maximum weight to the k=2 componenet (i.e., the sample covariance matrix) but the tissue specific SNPs assigned maximum responsibility to the tissue-specific configuration componenet in the BMA case. We might also want to consider the reverse example - i.e., a case in which the lfsr is low across all tissues. We would expect that our estimate of the posterior means using the model with BMA will tend to push these estimates together (because of large ‘responsibility’ on the 1-1-1-1 componenet) when compared with a model that does not enforce these configurations.

## [1] "k=2;l=19"

Indeed, we see that in this case the maximum responsibility is assinged to the U_k=54 which is the [1,1,1,1,1 …] componenet. We can compare to the estimates that use only the 207 componenets without the configurations, which shows that the maximum responsibility is assigned at the component which considers the sample covariance matric (U_k =2)

## [1] "k=2;l=19"

Now, we can consider the correlation among lfsr and posterior mean estimates with both models. We would hope that the addition of the singleton configs would help us to identify tissue-specific effects when they exist, and thus reduce the correlation among posterior mean estimates between tissues.

library(gplots)
## 
## Attaching package: 'gplots'
## 
## The following object is masked from 'package:stats':
## 
##     lowess
U=read.table("~/matrix_ash/tissuenames.txt")[,1]

heatmap.2(cor(post.means.bma),labRow=U,
          labCol=U,main="Cor(PostmeanEE)UsingConfigs")

heatmap.2(cor(post.means),labRow=U,
          labCol=U,main="Cor(PostMeanEE)NoConfigs")

It might also be interesting to consider how the inclusion of the singleton and full configs adds to our understanding of the estimates of the posterior effect size. We would expect there to be increased tissue=specific resolution of the lfsr in the inclusions of configurations case because the truly tissue-specific effect size estimates will assign the majority of their posterior weight to the singleton configuration which suits them, thus causing this lfsr to be reduced at the expense of elevated lfsr in the alternative tissues.

## Loading required package: reshape2
## Loading required package: ggplot2

We would expect some non-zero lfdrs in the ‘with configs’ model, because in the tissue specific case the majority of the posterior weight may be on the singleton configurations. This further illustrates the sensitivity of the lfdr measurement to the selection of prior weights.

Lastly, we can consider the heatmaps of the lfsrs with and without the addition of configurations. We see that the addition of the configuration tends to shrink the tissue-specifc effects in the ‘off tissues’, thus resulting in increased LFSRs in the alternative tissues and reduced LFSRs in the tissue specific cases. (Plot not included for size reasons)

WE can also plot the heatmaps of the posterior means.

Similarly, we might want to plot the posterior mean estimates of associations which don’t necessarily show a large tissue specific or completely shared effect. First, we shos three examples not using BMA

And now we plot using BMA:

We can see that for associations which demonstrate variability across tissues, the difference between the two models is less evident.

We might also want to compare the posterior mean estimates all together: